b'TREASURY INSPECTOR GENERAL FOR TAX ADMINISTRATION

Customer Account Data Engine 2 Database Validation Is Progressing; However, Data Coverage, Data Defect Reporting, and Documentation Need Improvement

September 29, 2014

Reference Number: 2014-20-063

This report has cleared the Treasury Inspector General for Tax Administration disclosure review process and information determined to be restricted from public release has been redacted from this document.

Phone Number / 202-622-6500
E-mail Address / TIGTACommunications@tigta.treas.gov
Website / http://www.treasury.gov/tigta

HIGHLIGHTS

CUSTOMER ACCOUNT DATA ENGINE 2 DATABASE VALIDATION IS PROGRESSING; HOWEVER, DATA COVERAGE, DATA DEFECT REPORTING, AND DOCUMENTATION NEED IMPROVEMENT

Highlights

Final Report issued on September 29, 2014

Highlights of Reference Number: 2014-20-063 to the Internal Revenue Service Chief Technology Officer.

IMPACT ON TAXPAYERS

There is significant effort underway to ensure the accuracy of individual taxpayer account data on the Customer Account Data Engine 2 (CADE 2) database. This effort is an important part of its implementation because inaccurate data could delay this database from becoming the authoritative source of data, thereby increasing the cost of implementation.

WHY TIGTA DID THE AUDIT

This review was part of our Fiscal Year 2014 Annual Audit Plan and addresses the major management challenge of Modernization. The overall audit objective was to evaluate IRS efforts to ensure that the data in the CADE 2 database are accurate and complete.

The IRS requested that TIGTA evaluate the new data validation testing methodology. TIGTA performed this audit during the data validation testing process and provided the IRS with recommendations for continuous improvement.

WHAT TIGTA FOUND

Data validation efforts were efficiently performed due to adequate planning and resource coordination. For example, detailed data validation plans ensured that test activities were on track and a new process ensured that data defects were effectively managed.

The IRS identified the data fields to be verified and how each would be validated. While a large percentage of the data fields are validated with automated data compare tools, there is no documented plan to ensure that data fields validated using other means are validated periodically.

The data sampling methodology for validating CADE 2 data is sound. The IRS developed a data sampling methodology to enable maximum data validation coverage by using a statistical sample, but key activities were not documented. After discussing the need to document the data sampling methodology, the IRS began development of the documentation. Several in-progress documents were provided for our review.

The IRS developed a Data Quality Scorecard to track progress in meeting data quality success criteria. However, the processes needed to effectively perform these activities were not sufficiently documented. As a result, some of the metrics were initially incorrectly reported.

WHAT TIGTA RECOMMENDED

TIGTA recommended that the Chief Technology Officer ensure that: 1) data validation test results are maintained and available for data fields not validated by automated data compare tools; 2) data validation plans include periodically validating the data fields that are not validated with automated data compare tools; 3) all data sampling processes are completely documented; 4) details needed for determining the Data Quality Scorecard metrics are completely documented; 5) all documentation needed to verify the data in the Data Quality Scorecard is stored for future reference; 6) automated data compare tools identify and report on data fields, not field identifier numbers; and 7) automated data compare tool reports clearly identify counters and align with data validation metrics.

The IRS agreed with six of the report’s seven recommendations. The IRS plans to maintain results for manual data validation activities, validate changes to the data fields that are not validated with automated data compare tools, develop documentation on the procedures to collect and maintain data used to support data validation metrics and the Scorecard development process, and store Scorecard source documentation.

DEPARTMENT OF THE TREASURY
WASHINGTON, D.C. 20220

TREASURY INSPECTOR GENERAL FOR TAX ADMINISTRATION

September 29, 2014

MEMORANDUM FOR CHIEF TECHNOLOGY OFFICER

FROM:       Michael E. McKenney
            Deputy Inspector General for Audit

SUBJECT:    Final Audit Report – Customer Account Data Engine 2 Database Validation Is Progressing; However, Data Coverage, Data Defect Reporting, and Documentation Need Improvement (Audit # 201320030)

This report presents the results of our review of the Customer Account Data Engine 2 data validation efforts. The overall objective of this review was to evaluate Internal Revenue Service (IRS) efforts to ensure that the data in the Customer Account Data Engine 2 (CADE 2) database are accurate and complete.
This review is included in the Treasury Inspector General for Tax Administration’s Fiscal Year 2014 Annual Audit Plan and addresses the major management challenge of Modernization.

While we are in general agreement with the IRS’s response, one area of disagreement is whether CADE 2 Transition State 1.5 should be closed. We believe it should not be closed because, as of June 2014, only 68 percent of logic paths and 81 percent of field identifiers had been validated, and data defects were identified. There is a significant risk that additional defects will be identified as data validation continues. Therefore, we believe that CADE 2 Transition State 1.5 should remain open until several consecutive data validation cycles are completed with no new data defects identified.

Management’s complete response to the draft report is included in Appendix VI.

Copies of this report are also being sent to the IRS managers affected by the report recommendations. If you have any questions, please contact me or Danny R. Verneuille, Acting Assistant Inspector General for Audit (Security and Information Technology Services).

Table of Contents

Background .......... Page 1
Results of Review .......... Page 4
    Data Validation Efforts Were Performed Efficiently Due to Adequate Planning and Resource Coordination .......... Page 4
    The CADE 2 Program Management Office Identified the Data Fields to Be Verified and How Each Would Be Validated; However, All Data Fields Are Not Being Periodically Validated .......... Page 5
        Recommendations 1 and 2: .......... Page 8
    The Data Sampling Methodology for Validating CADE 2 Data Is Sound; However, Key Processes in the Implementation of the Methodology Need to Be Documented .......... Page 9
        Recommendation 3: .......... Page 14
    The Documentation and Processes for Determining the Data Quality Scorecard Metrics Need Improvement .......... Page 14
        Recommendations 4 and 5: .......... Page 20
    The Field Identifier Compare Tool Validates Data for Downstream Systems, but Data Discrepancy Reports Need Improvement .......... Page 20
        Recommendation 6: .......... Page 22
        Recommendation 7: .......... Page 23

Appendices
    Appendix I – Detailed Objective, Scope, and Methodology .......... Page 24
    Appendix II – Major Contributors to This Report .......... Page 26
    Appendix III – Report Distribution List ..........
Page 27
    Appendix IV – Data Quality Scorecards .......... Page 28
    Appendix V – Glossary of Terms .......... Page 32
    Appendix VI – Management’s Response to the Draft Report .......... Page 35

Abbreviations

CADE 2        Customer Account Data Engine 2
EDMO          Enterprise Data Management Office
FLID          Field Identifier
IMF           Individual Master File
IRS           Internal Revenue Service
IT            Information Technology
KISAM         Knowledge, Incident/Problem, Service Asset Management
KPI           Key Performance Indicators
PMO           Program Management Office
TIGTA         Treasury Inspector General for Tax Administration

Background

The Customer Account Data Engine¹ 2 (CADE 2) Program is one of the top information technology modernization projects in the Internal Revenue Service (IRS). The CADE 2 mission is to provide state-of-the-art individual taxpayer account processing and data-centric technologies to improve service to taxpayers and enhance tax administration. The CADE 2 database will replace the current Individual Master File (IMF) account settlement system with a relational database processing system and become a key component in the IRS’s enterprise-wide, data-centric information technology strategy. Implementation of the CADE 2 database (Database Implementation) to support this objective has introduced a greater potential for data anomalies due to a complex infrastructure, the complexity of tax processing, and the introduction of a new relational database. As such, there is a need for a comprehensive plan for ensuring the quality and integrity of the data within the CADE 2 database and the data provided to downstream systems. In addition to standard testing procedures, several tools and methodologies have been identified and developed to validate the quality and integrity of the data and to identify anomalies within the data.

In March 2013, in its definition of “authoritative source,” the IRS Chief Counsel stated that if the data in CADE 2 are used as evidence of the transactions in the taxpayer’s account, the information obtained from CADE 2 must be identical to the IMF at any given point in time.

On November 5, 2012, the CADE 2 Executive Steering Committee approved a conditional CADE 2 Transition State 1 Milestone 5 exit with two conditions. 
On April 4, 2013, the CADE 2 Executive Steering Committee closed the November 2012 Milestone 5 exit conditions and opened two new exit conditions: 1) Data Assurance – “Getting the Data Right” and 2) Robust and Sustainable System Performance and Operational Readiness. These exit conditions are now being tracked by the IRS as Transition State 1.5. The criteria for closing the Data Assurance conditions are:
    • Verification of a statistically sound sample (911 data fields against 270 million taxpayer accounts) of data in the CADE 2 database with no Priority 1/Priority 2 data defect tickets.
    • Ability to scale data assurance tools to perform high-volume testing in time to test within filing season test windows.
    • Minimal (risk-based decision) code defects that could cause data defects downstream resulting in the need to use data correction tools.

The criteria for closing the Robust and Sustainable System Performance and Operational Readiness conditions are:
    • Address identified system performance concerns.
    • Meet organizational and operational readiness objectives.
    • Meet and exceed system performance targets for database processing within budgeted time frames in production.

¹ See Appendix V for a glossary of terms.

Over the past two years, the Treasury Inspector General for Tax Administration (TIGTA) reported on the progress of the CADE 2 Database Implementation. 
In September 2012, we reported that the IRS had data integrity checks in place at several levels of the CADE 2 database. Despite these controls and their data integrity testing efforts, the IRS could not ensure that the data on the CADE 2 database were consistently accurate and complete at the data field level due to the complexity of many of the data transformation rules and embedded business logic contained within IMF data fields.²

In September 2013, TIGTA reported that the CADE 2 database could not be used as a trusted source for downstream systems due to the 2.4 million data corrections that had to be applied to the CADE 2 database and the IRS’s inability to evaluate 431 CADE 2 database columns of data for data accuracy. During the audit, the IRS was in the process of developing additional tools and implementing a new data validation testing methodology intended to achieve timeliness, accuracy, integrity, validity, reasonableness, completeness, and uniqueness.

The IRS requested that TIGTA evaluate the new data validation testing methodology. TIGTA agreed to do so³ and performed this audit during the data validation testing process and provided the IRS with recommendations for continuous improvement. During fieldwork, the IRS took immediate steps to address concerns identified by TIGTA. Most of these actions are noted in the Management Action statements later in the report.

This review was performed at the IRS Information Technology (IT) organization’s offices in Lanham, Maryland, during the period August 2013 through May 2014. We conducted this performance audit in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objective. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objective. Detailed information on our audit objective, scope, and methodology is presented in Appendix I. Major contributors to the report are listed in Appendix II.

² TIGTA, Ref. No. 2012-20-109, The Customer Account Data Engine 2 Database Was Initialized; However, Database and Security Risks Remain, and Initial Timeframes to Provide Data to Three Downstream Systems May Not Be Met pp. 3–4 (Sept. 2012).
³ TIGTA, Ref. No. 2013-20-125, Customer Account Data Engine 2 Database Deployment Is Experiencing Delays and Increased Costs pp. 7–10 (Sept. 2013).

Results of Review

Data Validation Efforts Were Performed Efficiently Due to Adequate Planning and Resource Coordination

Detailed data validation plans were used to help ensure that test activities remain on track

The CADE 2 Database Implementation Data Validation Plan contains detailed information about the people, processes, and tools that will be leveraged to execute data validation and identify data anomalies in the Systems Acceptability Test environment and the Production Support Environment. 
To supplement the CADE 2 Database Implementation Data Validation Plan, the CADE 2 Program Management Office (PMO) also developed a Data Validation Execution Plan to facilitate the periodic meetings held to discuss the status of the data validation activities. The Data Validation Execution Plan included activities to be completed for each cycle of tests. Examples of activities include selecting the data samples for validation, executing the automated data compare tool, analyzing the data validation results reports, preparing problem tickets to correct defects, and assigning the problem tickets to the proper organization for resolution.

Adequate planning and resource coordination were achieved despite the Government shutdown and limited resources

The CADE 2 PMO adequately planned and coordinated the data validation testing schedule and process. Planning was accomplished despite the Government shutdown, limited testing support, and a limited testing environment during the November to December 2013 testing period. Accommodations were made to shift testing efforts from the Final Integration Testing environment to the Production Support Environment and to extend testing dates further into Calendar Year 2014. All of this required a great deal of coordination among the IT and business unit organizations. Testing implementation procedures were also defined and coordinated among all involved parties.

In addition, periodic checkpoint meetings were effectively used to identify, keep all partners informed of, and resolve an issue with using the Field Identifier (FLID) Compare Tool (High Volume) (hereafter referred to as the FLID Compare Tool) in the Production Support Environment. 
The data validation activities for Final Integration Testing were completed on schedule in January 2014, and the data validation activities in 2014 continue to meet the target completion dates.

Data defects were effectively managed through the Knowledge, Incident/Problem, Service Asset Management (KISAM) system

Data defects identified through both automated and manual means were effectively managed through the KISAM system. Testers generated KISAM tickets when they found data discrepancies not previously identified. Triage teams then analyzed the tickets and assigned them to the appropriate groups for resolution. IRS procedures require that testers verify corrections before closing KISAM tickets. The CADE 2 PMO monitored the list of KISAM tickets generated during data validation.

Most of the data correction tools were successfully developed and deployed to enable database data defect corrections

The IRS developed three new tools to correct CADE 2 database data defects caused by loading errors, the receipt of bad data from the IMF, or software/hardware failures during daily update runs.
    • The Update in Place tool executes direct updates to data on the CADE 2 database through the use of Structured Query Language update statements.
    • The Account Deleter/Re-Extractor tool makes corrections by deleting erroneous data from the database, reextracting it from the IMF, and loading the corrected data into the database.
    • The Taxpayer Identification Number Bypass Tool is used in conjunction with the Account Deleter/Re-Extractor tool. 
It allows daily update processing to proceed while temporarily bypassing updates for specific CADE 2 database accounts with known data problems until the problems can be corrected.

These tools were sufficiently tested through the combined efforts of the Enterprise Services Enterprise Systems Testing and the Applications Development organizations (both a part of the IT organization) and were successfully deployed into production in Calendar Year 2014. The last data correction tool, the FLID Specific Update Tool, is scheduled for deployment on June 27, 2014.

The CADE 2 Program Management Office Identified the Data Fields to Be Verified and How Each Would Be Validated; However, All Data Fields Are Not Being Periodically Validated

The Government Accountability Office’s Standards for Internal Control in the Federal Government state that control activities include verifications and accurate and timely recording of transactions and events.⁴ Transactions should be promptly recorded to maintain their relevance and value to management in controlling operations and making decisions.

According to information technology industry standards, data quality assurance can be achieved only when the following criteria are met:
    • Accuracy: Data must be correct and consistent.
    • Completeness: All related data must be linked from all possible sources.
    • Availability: Data must be available upon demand.
    • Timeliness: Current data must be available.

Data quality for the CADE 2 database is dependent on the database 
matching corresponding IMF data. The CADE 2 Database Implementation Data Validation Plan for 2013/2014 documents the activities that need to be performed in order to validate the CADE 2 database. This encompasses validation of all CADE 2 data fields that are derived from the IMF. In addition, data quality ensures that the CADE 2 data records match the corresponding data records from the IMF. This encompasses validation of all data fields that are fed downstream from the IMF currently and that will be fed to downstream systems by the CADE 2 database.

For the 2014 database format, the CADE 2 PMO prepared a data coverage matrix that identified 1,018 verifiable IMF data fields that would be validated. Figure 1 provides the distribution of the validation methods.

⁴ Government Accountability Office (formerly known as the General Accounting Office), GAO/AIMD-00-21.3.1, Internal Control: Standards for Internal Control in the Federal Government (Nov. 1999).

Figure 1: Data Fields Grouped by Validation Methods

2014 Data Field Count

Number of Fields to Be Validated                               1,018
  Fields Validated by FLIDs                                    (911)
  Fields Validated by Other Methods                              107

Other Validation Method Details
  No Need to Validate                                              3
  Database Integrity Check                             20
  Systems Acceptability Testing Cases                  41
  General Transcript Report Test                        2
  Manual Compare                                       41
  Total Fields Validated by Other Methods                        104
                                            Total                107

Source: CADE 2 Database Data Field Coverage v2.4.2 11222013_Final.
Figures in parentheses are negative (subtractions).

The FLID Compare Tool will validate 911 data fields that will be fed to downstream systems. The Data Quality Scorecard metrics used to monitor and report the status of data validation efforts will focus on only the data fields fed to downstream systems. Therefore, there will be no status reporting on the remaining 107 data fields.

We requested test documentation for each category to review the validation of the 104 data fields needing validation (three fields required no validation; see Figure 1). While the test documentation was not readily available, by May 9, 2014, we received sufficient testing documentation for 100 of the 104 data fields supporting that the data fields were initially validated.

In addition, the CADE 2 PMO determined how often the data fields derived from the IMF will be validated during production. The data validation execution schedule dated May 8, 2014, details data validation activities planned for production cycles 5 through 22. The data validation activities are concentrated on the data fields that will be fed to downstream systems. 
While we obtained test documentation supporting the initial validation of 100 of 104 data fields currently not fed to downstream systems, all 107 data fields not validated by the FLID Compare Tool are derived from the IMF; therefore, they should be periodically validated if the CADE 2 database is to become the authoritative source of data.

Without periodically validating all data derived from the IMF and maintaining adequate documentation of the validation results, management will not have full assurance that the data are complete and accurate.

On April 29, 2014, the CADE 2 Executive Steering Committee approved a proposal to close the Transition State 1.5 Data Assurance exit condition by June 27, 2014, after testing transmission of data to selected downstream systems. However, a Data Quality Scorecard reported that as of June 27, 2014, there were five open Priority 2 data defect tickets. Three of the five were from the data validation activities that were recently completed on June 27, 2014. Therefore, the exit condition that requires verification of a statistically sound sample (911 data fields against 270 million taxpayer accounts) of data in the CADE 2 database with no Priority 1 or 2 data defect tickets was not successfully met. We believe that Transition State 1.5 should not be closed until several consecutive cycles of data validation results show that no Priority 1 or 2 data defect tickets remain open. The IRS indicated that data validation is a dynamic process and that, when reviewing problem tickets, the nature of the ticket needs to be considered. 
In this case, the open tickets were of low impact and minimal risk.

The IRS closed the Data Assurance exit condition on June 17, 2014. With this closure, IRS management indicated acceptance of the risk of data defects occurring as data validation proceeds through the remainder of the processing year.

Recommendations

The Chief Technology Officer should:

Recommendation 1: For data fields not validated through automated data compare tools, ensure that data validation test results are maintained and available.

     Management’s Response: The IRS agreed with this recommendation and asserts that processes are in place. These test results are an integral part of maintaining transparency with CADE 2 stakeholders and delivery partners. The business organization data validation results and testing results are maintained based on the organization’s official procedures. The IRS affirms that it will continue to maintain results for manual data validation activities in accordance with standard procedures, on an ongoing basis.

Recommendation 2: Ensure that data validation plans include periodically validating the data fields that are not validated with automated data compare tools.

     Management’s Response: The IRS agreed with this recommendation. Any changes to the data fields that are not validated with automated data compare tools, such as annual filing season updates, will be validated through standard testing procedures. 
     The IRS has updated the data validation plan to reflect the frequency and process of manually validating data fields not fed to downstream systems.

The Data Sampling Methodology for Validating CADE 2 Data Is Sound; However, Key Processes in the Implementation of the Methodology Need to Be Documented

The Government Accountability Office’s Standards for Internal Control in the Federal Government state that control activities include verifications and accurate and timely recording of transactions and events. Transactions should be promptly recorded to maintain their relevance and value to management in controlling operations and making decisions. According to industry standards, data quality assurance can be achieved only when the following criteria are met: 1) accuracy; 2) completeness; 3) availability; and 4) timeliness.

The CADE 2 PMO developed a data sampling methodology to identify datasets (random and Smart samples) to cover all transformation logic paths and define appropriate Taxpayer Identification Numbers and modules for each validation method. Implementation of this methodology is ongoing and being refined.

The data sampling methodology was used throughout Systems Acceptability Testing and Final Integration Testing of the 2013 and 2014 versions of the data and continues to be used for production validation in order to maximize coverage of data transformation logic between the IMF and the CADE 2 database. Figure 2 illustrates the data flow and transformation process between the IMF and the CADE 2 database and from the CADE 2 database to downstream systems.
The methodology identifies the probability of certain transformation logic paths occurring and pinpoints specific Taxpayer Identification Numbers that meet specific business conditions and can be used for data validation.

Figure 2: The CADE 2 Database Corporate Files Online/IMF Online/Data Access Service Interface Data Flow

Source: TIGTA, Ref. No. 2013-20-125, Customer Account Data Engine 2 Database Deployment Is Experiencing Delays and Increased Costs p. 8 (Sept. 2013), and a presentation for the CADE 2 Executive Steering Committee Meeting held on April 29, 2014, slide 16. VSAM – Virtual Storage Access Method. CFOL – Corporate Files Online. IMFOL – Individual Master File Online.

The data sampling process to maximize coverage of transformation logic during data validation execution consists of the following activities:

   • Database Profiling identifies all of the data fields and transformation logic paths that can be tested as well as the probability of each transformation occurring in the data for that processing cycle – Because some business transactions occur infrequently or are unique, production data may not be available to validate those transformation rules until later in the processing year. Figure 3 outlines the high-level approach to the data sampling methodology, which will provide test cases as inputs to the Automated Compare Data Validation tool.
     Transformation logic paths that have a 20 percent or greater probability of occurring in the data will be included in a random sample; those with less than 20 percent probability will be included in a Smart sample.

Figure 3: Data Sampling Methodology – High-Level Approach

Source: CADE 2 Database Implementation Data Validation Plan, Version 2.0, p. 34, dated February 3, 2014. TIN – Taxpayer Identification Number. EST – Enterprise Systems Testing. SAT – Systems Acceptability Testing. FIT – Final Integration Testing. IMFOL – Individual Master File Online.

     Figure 4 provides the data sampling methodology that applies a statistical approach to determine the validation confidence. It determines the probability of each transformation logic path occurring through Database Profiling.
     For example, a business event with at least a 20 percent probability of occurring must occur 25 times to achieve a confidence level of 99.6222 percent. (This figure is consistent with 1 − 0.8^25 ≈ 0.996222, the probability that a path occurring in 20 percent of cases appears at least once among 25 independently sampled cases.)

Figure 4: Data Sampling Methodology – Statistical Approach

Source: Data Integrity Validation Smart Sampling Deep Dive Draft, dated April 18, 2013.

   • Taxpayer Identification Numbers/Module Generation includes identifying specific data (Taxpayer Accounts or Tax and Entity Modules) that can be tested by the data validation tools, which cover specific business conditions (that are unlikely to occur in a random sample of data) – We met several times with the Smart Sampling subject matter expert to discuss how this activity and the data profiling activities were performed. We were provided a spreadsheet that contained information such as transaction codes and the profiling analysis used for identifying the data and business conditions that can be tested. However, neither the identification process nor an explanation of the spreadsheet data was documented. Thus, we were unable to evaluate the process.
     The CADE 2 PMO stated it had not yet documented the processes because executing data quality activities (e.g., preparing random and Smart samples in time for data validation) had priority over the documentation.

   • Data Validation Execution includes testing the sampled Taxpayer Identification Numbers/Modules using the identified data validation methods – The Data Validation Execution Plans and FLID Compare reports show that random and Smart samples were used in the data validation tests. Validation of completeness is reported on the Data Quality Scorecard under the Data Coverage Section. This section was first populated for Production cycles 5 and 6, which reported on only the percentage of transformation logic paths covered. The methodology for validating completeness had not been documented.

     On March 11, 2014, a Fast Smart sampling process was tested in cycle 5. It reuses the regular Smart sampling process but can be applied to production on a weekly basis, while the regular Smart sampling process requires at least four weeks. The results indicate that the Fast Smart sampling method added five times more coverage than the regular random sampling method and helped to identify new defects. As a result, it was officially implemented for cycles 9 and 10, in addition to using random sampling. We received two results spreadsheets that summarized the results used to conclude that Fast Smart sampling provided more coverage with fewer cases.
     We received only seven of the eight source documents to support the summary spreadsheets; therefore, we were unable to completely confirm the numbers.

   • Reporting and Analysis – The following activities are associated with this step:

       a. Analyzes the transformation and data field coverage provided by data sampling and reports out results. Transformation Logic Paths coverage and data field coverage were included on the Data Quality Scorecard beginning with cycles 5/6 and 9/10, respectively.

       b. Validates the completeness of data profiling activities. We have not seen any documentation on the status of this activity.

Our statistician determined that the concept and process of using the data sampling methodology are sound. The methodology ensures that infrequently used data fields will be included in data validation testing and provides a statistical basis for deciding how many instances of a particular data field or business event are to be sampled, based on the probability of occurrence and target confidence level. While the processes used to implement the data sampling methodology were verbally described by IRS personnel in meetings, they had not been documented and were not available for review.

In addition, the process for measuring the effectiveness and success of the data sampling methodology in providing the expected coverage had not been documented. For example, the process for determining the percentage of transformation logic paths covered was not documented.
This information is needed to ensure that the percentages of transformation logic paths, FLIDs, and data fields covered are accurately identified for the Data Quality Scorecard.

Due to the significant time pressure and limited resources faced by the CADE 2 PMO to ensure that the CADE 2 data validation activities stay on course, conducting the data sampling activities had priority over fully documenting the processes for profiling the data and evaluating the effectiveness of the data sampling methodology. In addition, the CADE 2 PMO explained that although the methodology has been implemented, it is still being refined.

Until data validation processes are formally documented, IRS management cannot have full confidence that the correct data validation procedures are performed consistently. This may also reduce the assurance that CADE 2 data are effectively and completely tested. These processes should be documented as soon as possible to avoid the risk of losing the knowledge that only the subject matter experts have and to provide a reference for current and future use.

Management Action: After we discussed the need to document the data sampling methodology with CADE 2 PMO management, management recognized the urgency of the need and began developing the documentation.
Several in-progress documents were provided for our review, including the Defect Verification Process used by Smart Sampling and the CADE 2 Data Validation Smart Sample Process Overview documents.

Recommendation

Recommendation 3: The Chief Technology Officer should ensure that all data sampling methodology processes such as data profiling and calculating data field and transformation logic coverage are completely documented and that the documents are readily available for review. Where applicable, the documentation should include procedures to collect and maintain source data used to support data validation metrics.

     Management’s Response: The IRS agreed with this recommendation. The IRS is developing documentation on the procedures to collect and maintain source data used to support data validation metrics.

The Documentation and Processes for Determining the Data Quality Scorecard Metrics Need Improvement

The Government Accountability Office’s Standards for Internal Control in the Federal Government state that control activities include verifications and accurate and timely recording of transactions and events.
Transactions should be promptly recorded to maintain their relevance and value to management in controlling operations and making decisions.

According to the Data Quality Team Charter v 0.3 dated July 26, 2013, the team’s mission is to ensure the quality and integrity of the data within the CADE 2 database and the data fed to downstream systems by providing execution support for defect management activities and establishing a comprehensive Data Quality Scorecard to measure the progress towards data quality goals.

The Data Quality Team developed a Data Quality Scorecard that includes six key performance areas with success criteria: 1) Data Coverage; 2) Sample Size; 3) Data Validation Defect Summary; 4) Referential Integrity Checks; 5) Balance and Control Mechanisms Plus Aggregate Metrics; and 6) Data Correction Tool Status. Figure 5 provides the defined key performance indicators (KPI) and success criteria for each area. The KPIs that are grayed were not included in the initial Data Quality Scorecard because the information was not available.

Figure 5: Key Performance Indicators and Success Criteria

Source: CADE 2 Data Quality Scorecard for the 2014 Version of the Data as of December 16, 2013. TBD – To Be Determined. P1, P2 – Priority 1 or 2.

The first published Data Quality Scorecard, dated December 16, 2013, reported on pre-production data and was distributed to stakeholders on December 20, 2013. The Scorecard is presented in Appendix IV, Figure 1. The IRS initially planned to prepare a Scorecard every two weeks for distribution to stakeholders.
On March 21, 2014, we received information that a Scorecard would be produced for each data validation cycle.

We attempted to fully assess the accuracy of the entire Data Quality Scorecard for a specific cycle. However, due to the lack of supporting source documentation, we were unable to complete the assessment. Instead, we validated the individual sections of the Scorecard when sufficient source information was made available.

The results of our review follow:

Section 1 – Data Coverage: This section includes the Transactions/Business Events, the Logic Paths, and the Data Fields and FLIDs covered. The IRS relies on summary spreadsheets to report the data validation results for the first three KPIs. The Data Quality Scorecard for cycles 9/10 as of April 14, 2014, reported metrics for Logic Paths, Business Events, Data Fields, and FLIDs. We received summary spreadsheets for the first three metrics. We also received source documentation supporting the summary spreadsheet for the Logic Paths KPI but not for the Business Events and the Data Fields KPIs. Although the Scorecard reported 80–90 percent coverage of the FLIDs, we did not receive any documentation to support that metric. Figure 6 displays the Data Coverage portion of the Data Quality Scorecards.

Figure 6: Data Coverage

Source: Excerpts of the Data Quality Scorecards provided by the CADE 2 PMO.
Pre-PROD – Pre-production. i5 – Iteration 5.

Section 2 – Sample Size: This section includes the targeted number of Taxpayer Identification Numbers and/or modules expected to be compared and the actual number of Taxpayer Identification Numbers and/or modules compared for data validations performed prior to production cycles 5/6. Beginning with production cycles 5/6, the objective was to compare and report on modules. The source for the number of actual modules compared during production should have been documented in an FLID report. Until the end of April 2014, the number of actual modules compared was incorrectly reported because the IRS did not base the numbers on the FLID report. Instead, it used the targeted volumes for the random and Smart samples as the basis for reporting the actual modules compared. The IRS was not referring to the FLID reports for the actual number of modules compared because the FLID reports did not clearly indicate the actual number of modules compared. In addition, the process for determining the actual number was not documented. Also, the Wage and Investment Division Business Modernization Office (hereafter referred to as the Business Modernization Office) stated that it was in the process of learning how to read the FLID reports and verify the contents. As a result, the incorrect numbers were included in presentations submitted to CADE 2 executives and the Chief Technology Officer for their discussions.

Figure 7 shows the incorrect and correct numbers of modules actually compared.
The Business Modernization Office personnel stated that after learning more about the data captured in the FLID reports (how to read them and verify the contents), they updated the Scorecards from cycles 5/6 through the present to accurately reflect the actual number of modules compared. Prior to that, the numbers were based on the targeted volumes for the random and Smart samples. It appears that the IRS learned of the need to make the corrections after our repeated requests for documented source information.

Figure 7: The Incorrect and Correct Number of Actual Modules Compared As Reported on Various Iterations of the Data Quality Scorecard

                    Actual Modules Compared
Cycles      Incorrect Number      Correct Number
5/6         500,000/500,000       590,229/588,630
            577,794/576,618
7/8         591,302/589,264       591,652/590,308
9/10        623,372/500,000       500,042/611,374

Source: Data Quality Scorecards provided by the CADE 2 PMO.

Section 3 – Data Validation Defect Summary: This section reports the number of new data defect tickets opened and, of those, the number that remain open for that cycle as of the Scorecard date. It does not report the cumulative number of open unresolved tickets from other cycles as of that date. For example, the Data Quality Scorecard for cycles 15/16 as of May 12, 2014, reported that all of the new tickets opened during that time were closed because they were later determined not to be data issues. Because the Scorecard showed no open tickets, it might appear that all of the data are correct.
However, this is not the case because the Scorecard does not carry over the unresolved data defect tickets from prior cycles that remain in open status. For this reporting period, another management report shows seven open data defect tickets. All were estimated to be resolved and closed by May 28, 2014. IRS management indicated that initial Scorecards did not report cumulative open unresolved data defect tickets because each Scorecard covered only a two-week period. As of May 19, 2014, the IRS had begun producing an Aggregate Scorecard that includes all open data defect tickets.

Also, as of April 3, 2014, there were 10 open known data defects on the Known Defect List. These are data defects that have occurred on more than one occasion and need to be corrected. These, along with the new data defects that are identified during the data validation process, must be corrected before the CADE 2 database can replace the current IMF account settlement system with a relational database processing system and become a key component in the IRS’s enterprise-wide, data-centric information technology strategy.

Although the information is available, the Data Quality Scorecard does not show the impact of the data defects. For example, the Scorecard does not show the number of tax and/or entity modules or taxpayers affected.
When resources are limited, knowing the impact of the data defects could help prioritize the order in which data defects are resolved.

We also found a discrepancy between the Data Quality Scorecard for cycles 15/16 dated May 12, 2014, and the CADE 2 Data Implementation Health Report dated May 19, 2014 (hereafter referred to as the Health Report). The Scorecard, which was also embedded in the Health Report, reported “Eight new data defect tickets were initially opened for cycle 15/16 production Data Validation, but after further analysis, these tickets were determined to not be data issues and were closed.” The Health Report reported that eight data defect tickets were opened as a result of cycles 15 and 16 data validation; however, seven of them were deemed to be “no trouble found.” The remaining ticket was scheduled to be closed upon the delivery of FLID Compare Tool Iteration 6 in early June 2014.

The Data Quality Scorecard for Production Cycles 5/6 dated March 12, 2014, correctly reported that 12 new data defects were opened and one of the 12 was subsequently closed. However, we found two discrepancies in this section. The first is in the bar graph, which shows eight open tickets for cycle 5 and three for cycle 6. The spreadsheet with the source information shows seven open tickets for cycle 5 and four for cycle 6.

The second discrepancy is with the percentages in the pie chart. The chart shows that 41 percent and 17 percent of the Defect Origin/Source were from Solutions Engineering–Data Engineering and Identify and Extract Account Changes, respectively.
However, based on the source spreadsheet, Solutions Engineering–Data Engineering had six (50 percent) of the 12 and Identify and Extract Account Changes had one (8 percent) of the 12.

Section 4 – Referential Integrity Checks: Referential Integrity Checks are run against the database to ensure that tax account information that is spread over many tables can be reassembled into a coherent tax account (i.e., prevent orphan data in the database). Identified issues should be resolved according to standard operating procedures. As of April 24, 2014, all Data Quality Scorecards reported that all checks for the cycles passed. We obtained and reviewed 14 source reports for cycles 201250 through 201310 but none corresponded to the Data Quality Scorecards we received. Therefore, we were unable to confirm that all Referential Integrity checks passed.

Section 5 – Balance and Control Mechanisms + Aggregated Metrics: This section reports results from two sources:

   • Simplified Financial Balance Reports: These are financial integrity checks to ensure that amounts from the IMF equal the CADE 2 database amounts. Chief Financial Officer requirements include balancing the sum of certain financial fields. Specialized financial reports are generated and provided to the Chief Financial Officer for manual comparison and verification.
     For the Data Quality Scorecard for Cycles 5/6 as of March 12, 2014, we received and compared the nine IMF reports to the nine CADE 2 database reports and found that all nine balanced to the penny.

   • CADE 2/IMF Analytical Report Business Objects Enterprise Comparisons: This activity validates that CADE 2 data match IMF data by comparing data from certain IMF and CADE 2 database reports. As planned, these metrics were first reported on the Data Quality Scorecard for cycles 9/10. The April 14, 2014, version shows that the data fields in nine of the 10 reports matched. The remaining report has an 87 percent match rate, but the CADE 2 PMO is expecting results from another test report. We received a summary report that supported all the data in the Business Objects Enterprise Report Execution Analysis section except for the data in the CADE Fields Used column. However, we did not receive documents supporting the statistics in the summary report.

Section 6 – Data Correction Tools: We received documentation which confirms that six of the seven tools were implemented into production. Therefore, this section correctly reported the status of the tools.

Because the IT organization and the Business Modernization Office worked together to develop the Scorecard and the KPIs, the Scorecard should meet the stakeholders’ needs. In addition, the processes used to ascertain the actual statistical data need to be documented to ensure that they are correctly and accurately determined. This will help stakeholders fully understand what the statistics represent if they request an explanation for the basis of the statistics. When processes are not sufficiently documented, there is a risk that they are not correctly performed.
For example, because the FLID report does not clearly state the total number of actual modules compared and there were no documented instructions for identifying this, the number of actual modules compared was incorrectly determined and was incorrectly reported both on the Scorecards through April 2014 and in presentations to management.

Management Action: After we met with the CADE 2 PMO regarding the lack of sufficient supporting documentation needed to validate the metrics on the Data Quality Scorecard, the PMO began collecting and providing us with the documentation. For example, as stated above, we received source documentation that confirmed the logic path KPI metric.

Recommendations

The Chief Technology Officer should:

Recommendation 4: Ensure that all processes for determining the metrics needed to populate the Data Quality Scorecard are completely documented and that the documents are readily available for review.

     Management’s Response: The IRS agreed with this recommendation.
     The IRS has developed and will be publishing documentation of the Scorecard development process. The IRS will continue to update, maintain, and develop documentation around the Data Quality Scorecard to ensure that its inputs and processes are transparent to CADE 2 stakeholders.

Recommendation 5: Ensure that all documentation needed to verify the data in the Data Quality Scorecard is stored for future reference and to provide the information needed for oversight activities, such as spot checks to confirm the accuracy of the Scorecard.

     Management’s Response: The IRS agreed with this recommendation. The IRS has documented procedures for developing the Scorecard and a checklist to verify the contents, and it has begun storing all Scorecard sources in a SharePoint repository. The IRS will ensure that the repository remains organized and easily accessible.

The Field Identifier Compare Tool Validates Data for Downstream Systems, but Data Discrepancy Reports Need Improvement

The IRS data strategy requires that data fields be uniquely and consistently identified across systems. The validation of data on the CADE 2 database is critical to the database becoming a trusted source of data for downstream systems and ultimately the file of record for IMF data.

The FLID Compare Tool was developed as an automated way to compare high volumes of IMF data to CADE 2 data during the data validation process. It was the main tool used for automated data validation during the 2014 Filing Season. The tool leverages the existing IRS process of using field identifiers (i.e., FLIDs) to help identify IMF data. Currently, Corporate Files Online processing builds FLIDs for IMF data from the IMF Virtual Storage Access Method files.
The new CADE 2 Data Access Service builds these same FLIDs for data from the CADE 2 database. The FLID Compare Tool compares FLIDs from both sources and identifies any discrepancies in their data values.

Current IMF processing sends IMF data to downstream systems in files using FLIDs. By comparing FLIDs built from IMF Virtual Storage Access Method files to FLIDs built from the CADE 2 database, the FLID Compare Tool can cover all the data consumed by downstream systems. Therefore, 911 (89 percent) of the 1,018 verifiable data fields on the CADE 2 database can be identified through the use of FLIDs; the remaining 107 (11 percent) data fields populated into the CADE 2 database from the IMF are not related to an FLID number. Other validation methods are used to ensure coverage of the data fields not covered by the FLID Compare Tool. (This information is summarized in Figure 1 of this report.)

The FLID Compare Tool produces several reports on the results of its comparisons. One of them, the Discrepancy Detail Report, lists all data discrepancies by FLID number, FLID name, IMF data field name, and CADE 2 database table and column. The Business Modernization Office used this report to review and analyze details on data discrepancies found during the data validation process.

The FLID Coverage Count Report, added for the 2014 Filing Season, provides metrics on FLID coverage during execution of the FLID Compare Tool.
It provides a complete list of all unique\nFLID numbers, whether or not the FLID was compared, and the match/no-match count for each\ncompared FLID.\nThe Enterprise Data Management Office (EDMO) maintains the list of FLIDs. We compared\nthe EDMO FLID list to the one in the FLID Coverage Count Report and found discrepancies.\n      \xef\x82\xb7   10 FLID numbers on the EDMO list were missing from the FLID Coverage Count\n          Report.\n      \xef\x82\xb7   23 FLID numbers on the FLID Coverage Count Report did not have FLID names.\n      \xef\x82\xb7   36 FLID numbers in the FLID Coverage Count Report were listed as \xe2\x80\x9creserved,\xe2\x80\x9d\n          compared to 37 in the EDMO list.\nThese discrepancies raise questions as to whether the FLID Compare Tool is accurately\ncomparing all data at the FLID level.\nAfter we alerted the IRS to the 10 missing FLID numbers, the IRS researched the issue and\nfound that the missing FLIDs should have been included in the FLID Coverage Count Report\nand compared by the FLID Compare Tool. The IRS plans to add the missing FLIDs to the next\niteration of the FLID Compare Tool scheduled for implementation in the summer of 2014. In the\ninterim, the IRS is using another automated tool to review the 10 missing FLIDs.\nWhile FLID numbers are currently used by the IMF to pass data to downstream systems,\nFLID numbers do not uniquely identify data on the IMF. They are used in conjunction with their\nposition on the IMF data record to provide uniqueness. There are 805 FLID numbers5 and\n911 FLID data fields on the CADE 2 database. This indicates that some FLID numbers are used\nmore than once for data field coverage. For example, the last name in the IMF data field\nTaxpayer Nameline is represented by FLID 0733. However, FLID 0733 is mapped to\nthree separate data fields on the CADE 2 database. 
Specifically:\n   \xef\x82\xb7    Taxpayer_Nameline.Joint_Last_Nm.\n   \xef\x82\xb7    Taxpayer_Nameline.Primary_Last_Nm.\n   \xef\x82\xb7    Taxpayer_Nameline.Secondary_Last_Nm.\n\n5\n    FLID number sequence count (842) minus reserved FLID numbers (37) = 805 FLID numbers used in 2014.\n\nThe FLID Coverage Count Report counts by unique FLID number only; it does not trace back to\nunique data fields on the database. Without this traceability, it is impossible to verify that all\ndatabase fields are validated by the FLID Compare Tool without additional analysis. After we\nraised this issue to the IRS, the IRS responded that it will explore ways to address the\none-to-many relationship of FLIDs to data fields in future iterations of the FLID Compare Tool.\nThe FLID Compare Tool is used to gather metrics for data validation reporting. The Extended\nDiscrepancy Counts Report is used to provide sample size counts for the Data Quality Scorecard;\nhowever, the report takes counts by program name, and documentation does not indicate how\nthese program names translate to sample size counts. Therefore, data in this report may be\nmisinterpreted and lead to incorrect information reported to management. In addition, if the\nFLID list used in the FLID Compare Tool does not match the FLID list maintained by the\nEDMO, the IRS cannot be assured that it is accurately and completely validating all FLIDs that\nare intended to be fed to downstream systems. 
Finally, if the FLID Compare Tool cannot trace\nback to the 911 data fields on the CADE 2 database that it is tasked with validating, the IRS\ncannot guarantee the accuracy or the completeness of those fields.\n\nRecommendations\nThe Chief Technology Officer should:\nRecommendation 6: Ensure that automated data compare tools identify and report on data\nfields, not FLID numbers, to align CADE 2 data validation efforts with the IRS\xe2\x80\x99s data strategy\ngoal of uniquely identifying data fields across systems.\n       Management\xe2\x80\x99s Response: The IRS disagreed with this recommendation. Data defects\n       are identified at the FLID level; the output from the FLID Compare Tool provides counts\n       by FLID number. Traceability to unique data fields is established through the use of\n       transformation rules analyzed during the defect triage process. This provides an\n       acceptable level of traceability to unique data fields. The IRS\xe2\x80\x99s data strategy goal for\n       uniquely identifying data fields across systems is considered a guiding principle; however,\n       programs are given discretion for when identifying at the data field level is necessary.\n       Office of Audit Comment: TIGTA maintains its position that CADE 2 data validation\n       efforts should identify and report on individual data fields. The IRS Data Strategy and\n       Roadmap (dated August 27, 2012) stresses that information should be consistently\n       represented across systems, available at the same level of granularity, and have summary\n       levels so that meaningful comparisons can be made. 
The Data Strategy does not mention\n       that discretion is given to programs to determine when this principle would or would not\n       apply.\nRecommendation 7: Ensure that automated data compare tool reports clearly identify\ncounters and align with data validation metrics.\n     Management\xe2\x80\x99s Response: The IRS agreed with this recommendation. The High\n     Volume FLID Compare Tool Design Document will be updated to explain the source of\n     the numbers that are populated for those program names in Report 4, which will provide\n     the actual input record count. This will allow for accurate reporting of actual sample size\n     on the Scorecard.\n\n\n\n                                                                                         Appendix I\n\n            Detailed Objective, Scope, and Methodology\n\nOur overall audit objective was to evaluate IRS efforts to ensure that the data in the CADE 2\ndatabase1 are accurate and complete. To accomplish our objective, we:\nI.         Assessed the effectiveness of the CADE 2 Data Validation methodology.\n              A. Reviewed the CADE 2 Database Implementation Data Validation Plan.\n              B. Evaluated the data sampling methodology.\nII.        
Evaluated the implementation and effectiveness of automated compare tools in the\n           CADE 2 data validation process.\n              A. Reviewed documentation to determine if formal planning and resource\n                 coordination occurred for the implementation of the automated compare tools in\n                 the CADE 2 data validation process.\n              B. Interviewed subject matter experts to determine how each automated compare\n                 tool is used in the data validation process.\n              C. Reviewed testing results generated from each tool to determine the effectiveness\n                 of the tool in the data validation process.\nIII.       Evaluated the effectiveness of the CADE 2 Data Quality Team.\n              A. Reviewed the CADE 2 Data Quality Team Charter.\n              B. Determined what metrics (if any) currently exist for CADE 2 data validation\n                 activities and how these metrics are being used to measure data quality.\n              C. Evaluated KPIs developed by the team to ensure that they adequately measure\n                 CADE 2 data quality.\n              D. Evaluated the monitoring and reporting processes for KPIs.\n              E. Evaluated the effectiveness of the data defect management process.\n\n1\n    See Appendix V for a glossary of terms.\n\nInternal controls methodology\nInternal controls relate to management\xe2\x80\x99s plans, methods, and procedures used to meet their\nmission, goals, and objectives. Internal controls include the processes and procedures for\nplanning, organizing, directing, and controlling program operations. 
They include the systems\nfor measuring, reporting, and monitoring program performance. We determined that the\nfollowing internal controls were relevant to our audit objective: the Government Accountability\nOffice\xe2\x80\x99s Standards for Internal Control in the Federal Government,2 the CADE 2 Database\nImplementation Data Validation Plan, various meetings such as the CADE 2 Weekly Executive\nStatus Meetings and periodic data validation execution checkpoint meetings, design documents,\nand data validation policies and procedures. We evaluated these controls by conducting\ninterviews with IRS management and staff; attending CADE 2 meetings; and reviewing and\nevaluating documents such as the CADE 2 Data Quality Team Charter, the CADE 2 Database\nImplementation Data Validation Plan and Data Validation Execution Plans, the FLID Compare\nTool design documents, and related FLID reports.\n\n2\n Government Accountability Office (formerly known as the General Accounting Office), GAO/AIMD-00-21.3.1,\nInternal Control: Standards for Internal Control in the Federal Government (Nov. 1999).\n\n\n\n                                                                              Appendix II\n\n                 Major Contributors to This Report\n\nAlan R. Duncan, Assistant Inspector General for Audit (Security and Information Technology\nServices)\nDanny Verneuille, Director\nMyron Gulley, Audit Manager\nTina Wong, Lead Auditor\nRichard Borst, Senior Auditor\nArlene Feskanich, Information Technology Specialist\nErika D. 
Axelson, Ph.D., Statistician\n\n\n\n                                                                             Appendix III\n\n                          Report Distribution List\n\nCommissioner C\nOffice of the Commissioner \xe2\x80\x93 Attn: Chief of Staff C\nDeputy Commissioner for Operations Support OS\nDeputy Commissioner for Services and Enforcement SE\nCommissioner, Wage and Investment Division SE:W\nDeputy Chief Information Officer for Operations OS:CTO\nAssociate Chief Information Officer, Applications Development OS:CTO:AD\nAssociate Chief Information Officer, Enterprise Information Technology \xe2\x80\x93 Program\nManagement Office OS:CTO:EIT\nDirector, Enterprise Systems Testing OS:CTO:AD:EST\nChief Counsel CC\nNational Taxpayer Advocate TA\nDirector, Office of Legislative Affairs CL:LA\nDirector, Office of Program Evaluation and Risk Analysis RAS:O\nOffice of Internal Control OS:CFO:CPIC:IC\nAudit Liaison: Director, Risk Management Division OS:CTO:SP:RM\n\n\n\n                                                                                          Appendix IV\n\n                             Data Quality Scorecards\n                      Figure 1: First Published Data Quality Scorecard\n                             Snapshot as of December 16, 2013\n\n\n\n\nSource: CADE 2 PMO. INIT \xe2\x80\x93 Initialization. 
FIT \xe2\x80\x93 Final Integration Testing. PSE \xe2\x80\x93 Production Support\nEnvironment. SAT \xe2\x80\x93 Systems Acceptability Testing. RI \xe2\x80\x93 Referential Integrity. DU \xe2\x80\x93 Daily Update. TIN \xe2\x80\x93\nTaxpayer Identification Number. IBM \xe2\x80\x93 International Business Machines. IMFOL \xe2\x80\x93 Individual Master File Online.\nSCOP \xe2\x80\x93 Standard Corporate Files On Line Overnight Processing. Vol \xe2\x80\x93 Volume. P1, P2, P3 \xe2\x80\x93 Priority 1, 2, or 3.\nDAS \xe2\x80\x93 Data Access Service. IEAC \xe2\x80\x93 Identify and Extract Account Changes. INF \xe2\x80\x93 Informatica. SE-DE \xe2\x80\x93 Solutions\nEngineering \xe2\x80\x93 Data Engineering. SDLC \xe2\x80\x93 Systems Development Life Cycle. Reqs \xe2\x80\x93 Requirements.\nDev \xe2\x80\x93 Development. DIT \xe2\x80\x93 Development, Integration, and Testing. Functl \xe2\x80\x93 Functional.\n\n\n\n                              Figure 2: Data Quality Scorecard\n                        for Production Cycle 9/10 as of April 14, 2014\n\n\n\n\nSource: CADE 2 PMO. Pre-PROD \xe2\x80\x93 Pre-production. PROD DU \xe2\x80\x93 Production Daily Update.\nP1, P2, P3 \xe2\x80\x93 Priority 1, 2, or 3. DB \xe2\x80\x93 Database. RI \xe2\x80\x93 Referential Integrity. B&C \xe2\x80\x93 Balance and Control. BOE \xe2\x80\x93\nBusiness Objects Enterprise. ACNT \xe2\x80\x93 Accounts. IRAF \xe2\x80\x93 Individual Retirement Account File. FERDI \xe2\x80\x93 Federal\nEmployee/Retiree Delinquency Initiative. SB \xe2\x80\x93 Small Business and Self-Employed. WI \xe2\x80\x93 Wage and Investment. MS\n\xe2\x80\x93 Milestone. 
TIN \xe2\x80\x93 Taxpayer Identification Number.\n\n\n\n                          Figure 3: Revised Data Quality Scorecard\n                        for Production Cycle 9/10 as of April 22, 2014\n\n\n\n\nSource: CADE 2 PMO. Pre-PROD \xe2\x80\x93 Pre-production. PROD DU \xe2\x80\x93 Production Daily Update.\nP1, P2, P3 \xe2\x80\x93 Priority 1, 2, or 3. DB \xe2\x80\x93 Database. RI \xe2\x80\x93 Referential Integrity. B&C \xe2\x80\x93 Balance and Control. BOE \xe2\x80\x93\nBusiness Objects Enterprise. ACNT \xe2\x80\x93 Accounts. IRAF \xe2\x80\x93 Individual Retirement Account File. FERDI \xe2\x80\x93 Federal\nEmployee/Retiree Delinquency Initiative. SB \xe2\x80\x93 Small Business and Self-Employed. WI \xe2\x80\x93 Wage and Investment. MS\n\xe2\x80\x93 Milestone. TIN \xe2\x80\x93 Taxpayer Identification Number.\n\n\n\n                              Figure 4: Data Quality Scorecard\n                        for Production Cycle 15/16 as of May 12, 2014\n\n\n\n\nSource: CADE 2 PMO. Pre-PROD \xe2\x80\x93 Pre-production. P1, P2, P3 \xe2\x80\x93 Priority 1, 2, or 3. RI \xe2\x80\x93 Referential Integrity.\nB&C \xe2\x80\x93 Balance and Control. PROD DU \xe2\x80\x93 Production Daily Update. 
TIN \xe2\x80\x93 Taxpayer Identification Number.\n\n\n\n                                                                                     Appendix V\n\n                                Glossary of Terms\n\nTerm                                Definition\n\nApplications Development            The development organization for systems that manage taxpayer\n                                    accounts from the initial filing of a tax return to interactions\n                                    with the taxpayers and potential audit and collection activities.\n                                    It also provides enterprise-wide administrative systems related to\n                                    workforce support, human capital, financial, and facilities.\n\nBusiness Event                      Consists of transactions and nontransactions. A transaction is a\n                                    business event. An example of a transaction is the posting of a\n                                    tax return to the taxpayer\xe2\x80\x99s account. A nontransaction is usually\n                                    generated by a transaction. 
An example of a nontransaction is\n                                    the balance section of the taxpayer\xe2\x80\x99s account.\n\nCorporate Files Online              A collection of \xe2\x80\x9cread only\xe2\x80\x9d files extracted from the Master Files\n                                    and maintained at the Enterprise Computing Centers in\n                                    Memphis, Tennessee, and Martinsburg, West Virginia.\n\nCustomer Account Data Engine        The foundation for managing taxpayer accounts in the IRS\n(CADE)                              modernization plan. It will consist of databases and related\n                                    applications that will replace the existing IRS Master File\n                                    processing systems and will include applications for daily\n                                    posting, settlement, maintenance, refund processing, and issue\n                                    detection for taxpayer tax account and return data.\n\nCycle                               A week, which is usually designated by a cycle number when\n                                    referring to IRS processing activities.\n\nData Access Service                 A set of common capabilities that mediate relationships between\n                                    applications throughout the enterprise and the external\n                                    community. 
In general, the Data Access Service layer supports\n                                    inter-application integration and sharing of data and functions\n                                    that are maintained in separate application systems.\n\nDatabase                            A collection of information that is organized so that it can easily\n                                    be accessed, managed, and updated.\n\nData-Centric                        Refers to a focus on the specific data relevant to a given task.\n\nField Identifier (FLID)               An IRS file format that uses a numeric field (i.e., FLIDs) to\n                                      identify a data field.\n\nFLID Compare Tool (High               An automated tool that compares a high volume of taxpayer\nVolume)                               accounts (the business requirement is to compare 1 million tax\n                                      modules in 40 hours). 
The tool is intended to compare data in\n                                      the IMF and CADE 2.\n\nFiling Season                         The period from January through mid-April when most\n                                      individual income tax returns are filed.\n\nFinal Integration Testing             A system test consisting of integrated end-to-end testing of\n                                      mainline tax processing systems to verify that new releases of\n                                      interrelated systems and hardware platforms can collectively\n                                      support the IRS business functions allocated to them.\n\nGeneral Transcript Report             A report used by the Chief Financial Officer and Business\n                                      Modernization Office during data validation to compare the\n                                      corresponding data fields to ensure identical data.\n\nIndividual Master File                The IRS files that maintain transactions or records of individual\n                                      tax accounts.\n\nIndividual Master File Online         This provides online access to individual taxpayer returns.\n\nKnowledge, Incident/Problem,          An IRS application that maintains the complete inventory of\nService Asset Management              information technology and non\xe2\x80\x93information technology assets,\n                                      including computer hardware and software. 
It is also the\n                                      reporting tool for problem management with all IRS-developed\n                                      applications, and shares information with the Enterprise Service\n                                      Desk.\n\nMilestone                             Provides for \xe2\x80\x9cgo/no-go\xe2\x80\x9d decision points in a project and is\n                                      sometimes associated with funding approval to proceed.\n\nPriority 1 Defect Ticket              An incident ticket issue exhibiting the following characteristics:\n                                      1) resulting in severe mission-critical work stoppage or any issue\n                                      relating to safety or health (e.g., fire, electrical shock);\n                                      2) affecting vital IRS customer commitments of national or\n                                      area-wide scope; 3) affecting multiple internal or external\n                                      customers and service to taxpayers; and 4) requiring immediate\n                                      action.\n\nPriority 2 Defect Ticket              An incident ticket issue with the potential to result in a work\n                                      stoppage and/or to lead to severe mission-critical work stoppage\n                                      if actions are not taken to resolve the incident.\n\nProduction Support Environment        A close replica of the IRS production environment used for\n                                      various 
activities such as performance testing and data\n                                      validation.\n\nRequirement                           A statement of capability or condition that a system, subsystem,\n                                      or system component must have or meet to satisfy a contract,\n                                      standard, or specification.\n\nRisk                                  A potential event that could have an unwanted impact on the\n                                      cost, schedule, business, or technical performance of an\n                                      information technology program, project, or organization.\n\nSmart Sample                          A sample of modules selected as a result of the Smart sampling\n                                      process, which is part of the CADE 2 data validation data\n                                      sampling methodology. The Smart sampling process will ensure\n                                      that infrequently seen data fields will be included in data\n                                      validation testing. 
It will also provide a statistical basis for\n                                      deciding how many instances of a particular data field or\n                                      business event are to be sampled based on the probability of\n                                      occurrence and target confidence level.\n\nStructured Query Language             A standard interactive and programming language for getting\n                                      information from and updating a database.\n\nSystems Acceptability Testing         Testing conducted to verify a system satisfies application\n                                      requirements.\n\nTransformation Logic Path             This is the value of a data field based on the transformation rule\n                                      conditions it meets.\n\nTransformation Rule                   A rule to set the value in a field in the CADE 2 database. It may\n                                      contain multiple conditions to decide the value of that field.\n                                      Each condition defines a logic path for the transformation.\n\n\n\n                                                  Appendix VI\n\nManagement\xe2\x80\x99s Response to the Draft Report\n'